
Modify pipeline for whole transcriptome using HUGO names #4

Open: gadenbuie wants to merge 5 commits into `master` from `mayo-545`


@gadenbuie
Member

I modified the PAM50 pipeline to use HUGO gene names instead, pulling probeset-to-gene mappings from the latest HuEx-1_0-st-v2 Probeset Annotations (released 7/16/16). This file sits behind an Affymetrix login; it was downloaded from http://www.affymetrix.com/Auth/analysis/downloads/na36/wtexon/HuEx-1_0-st-v2.na36.hg19.probeset.csv.zip
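For readers unfamiliar with the annotation file, here is a minimal sketch of how a probeset-to-gene table could be derived from it. It is not the pipeline's code. It assumes the CSV has `probeset_id` and `gene_assignment` columns, that multiple assignments in one cell are separated by ` /// `, that fields within an assignment are separated by ` // ` with the gene symbol in the second field, and that `---` marks an unannotated probeset. The helpers `gene_symbols_from_assignment()` and `probeset_gene_map()` are hypothetical, and the toy rows are made up.

```r
# Hypothetical helper (assumed gene_assignment format):
# "accession // symbol // description // cytoband // entrez /// accession // ..."
gene_symbols_from_assignment <- function(assignment) {
  if (is.na(assignment) || assignment == "---") return(character(0))
  entries <- strsplit(assignment, " /// ", fixed = TRUE)[[1]]
  fields <- strsplit(entries, " // ", fixed = TRUE)
  # Take the second field (the symbol) of each assignment, if present
  symbols <- vapply(fields, function(f) {
    if (length(f) >= 2) trimws(f[2]) else NA_character_
  }, character(1))
  unique(symbols[!is.na(symbols) & symbols != ""])
}

# Hypothetical helper: one row per probeset/symbol pair, so a probeset
# assigned to several genes appears several times
probeset_gene_map <- function(annot) {
  symbols <- lapply(annot$gene_assignment, gene_symbols_from_assignment)
  data.frame(
    probeset_id = rep(annot$probeset_id, lengths(symbols)),
    symbol = as.character(unlist(symbols)),
    stringsAsFactors = FALSE
  )
}

# Toy rows standing in for the annotation file (made-up values)
annot <- data.frame(
  probeset_id = c(1, 2, 3),
  gene_assignment = c(
    "NM_000001 // GENEA // gene a // 1p36 // 111",
    "NM_000002 // GENEB // gene b // 2q11 // 222 /// NM_000003 // GENEC // gene c // 3p21 // 333",
    "---"
  ),
  stringsAsFactors = FALSE
)
probeset_gene_map(annot)
```

Keeping the table long (rather than one row per probeset) leaves the decision about multi-gene probesets to a later step.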

The HUGO name lookup uses the HUGO gene symbols curated by the HGNC at genenames.org, downloaded from http://beta.genenames.org/download/custom.
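One way such a lookup can work is sketched below, under assumptions: the HGNC download is read into a data frame with an `approved` column and a `previous` column of comma-separated previous symbols (the real column names depend on the fields chosen on the custom download page). `resolve_hugo()` is a hypothetical helper, not the pipeline's implementation. Symbols that are already approved are kept; otherwise a previous symbol is mapped to its approved symbol, unless it points at more than one approved symbol.

```r
# Hypothetical helper: map symbols onto HGNC-approved symbols.
# Assumes `hgnc$approved` and `hgnc$previous` (comma-separated) columns.
resolve_hugo <- function(symbols, hgnc) {
  # Direct hits: the symbol is already approved
  resolved <- ifelse(symbols %in% hgnc$approved, symbols, NA_character_)

  # Build a previous -> approved lookup table
  prev <- strsplit(ifelse(is.na(hgnc$previous), "", hgnc$previous), ",\\s*")
  lookup <- data.frame(
    previous = unlist(prev),
    approved = rep(hgnc$approved, lengths(prev)),
    stringsAsFactors = FALSE
  )
  lookup <- lookup[lookup$previous != "", ]

  # Drop previous symbols shared by several approved symbols (ambiguous)
  ambiguous <- unique(lookup$previous[duplicated(lookup$previous)])
  lookup <- lookup[!lookup$previous %in% ambiguous, ]

  # Fill the remaining symbols from the previous-symbol table (NA if absent)
  missing <- is.na(resolved)
  resolved[missing] <- lookup$approved[match(symbols[missing], lookup$previous)]
  resolved
}

hgnc <- data.frame(
  approved = c("GENEA", "GENEB", "GENEC"),
  previous = c("OLDA", NA, "OLDX, OLDC"),
  stringsAsFactors = FALSE
)
resolve_hugo(c("GENEA", "OLDA", "OLDC", "NOPE"), hgnc)
# "GENEA" "GENEA" "GENEC" NA
```

Whether to also consult alias symbols is a design choice; aliases are far more often shared between genes, so this sketch leaves them out.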

The `SUMMARIZE_FUNCTION` option sets the function that summarizes gene expression when multiple probesets map to the same gene name. You can pass a summary function from base R (e.g. `median` or `mean`; `quantile` works only when restricted to a single probability, since by default it returns five values), or you can write your own function, as long as it takes a vector as its first argument and returns a single value.
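To illustrate the contract on the summary function, here is a minimal sketch of collapsing a probeset-by-sample matrix to genes with an arbitrary summary function. `summarize_by_gene()` and `upper_quartile()` are hypothetical names, and the matrix and mapping are toy data; the pipeline's own collapsing code may differ. The sketch stops with an error if the function returns anything other than a single value.

```r
# Hypothetical helper: collapse probesets to genes, per sample
summarize_by_gene <- function(expr, gene, summary_fn) {
  # Align the mapping to the matrix rows and drop unmapped probesets
  gene <- gene[rownames(expr)]
  keep <- !is.na(gene)
  expr <- expr[keep, , drop = FALSE]
  gene <- gene[keep]
  # For each gene, apply summary_fn to each sample's column of probeset values
  by_gene <- lapply(split(seq_len(nrow(expr)), gene), function(idx) {
    apply(expr[idx, , drop = FALSE], 2, function(x) {
      value <- summary_fn(x)
      if (length(value) != 1) stop("summary function must return a single value")
      value
    })
  })
  # Stack into a gene-by-sample matrix (row names are the gene names)
  do.call(rbind, by_gene)
}

# A custom summary: 75th percentile as a single unnamed value
upper_quartile <- function(x) quantile(x, probs = 0.75, names = FALSE)

# Toy data: ps1 and ps2 both map to GENEA
expr <- matrix(c(5, 7, 6,
                 9, 8, 10,
                 2, 3, 4),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("ps1", "ps2", "ps3"), c("s1", "s2", "s3")))
probeset_gene <- c(ps1 = "GENEA", ps2 = "GENEA", ps3 = "GENEB")

summarize_by_gene(expr, probeset_gene, median)          # GENEA: 7, 7.5, 8
summarize_by_gene(expr, probeset_gene, upper_quartile)  # GENEA: 8, 7.75, 9
```

Passing `range` (which returns two values) fails the single-value check, which is the kind of function the option cannot accept.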

The scripts download the GSE46691 files into the `data` subfolder. If you have already downloaded them elsewhere, copy or symlink them into this folder and the scripts will skip the download.
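The skip-if-present behavior amounts to checking for the destination file before downloading. A minimal sketch using base R follows; `fetch_if_missing()` is a hypothetical helper, not the pipeline's function, and any URLs or paths passed to it are placeholders.

```r
# Hypothetical helper: download `url` to `dest` only when `dest` is missing.
# Returns (invisibly) TRUE if it downloaded, FALSE if an existing file was reused.
fetch_if_missing <- function(url, dest) {
  if (file.exists(dest)) {
    message("Using existing file: ", dest)
    return(invisible(FALSE))
  }
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  # mode = "wb" keeps binary files (e.g. compressed archives) intact on Windows
  status <- utils::download.file(url, dest, mode = "wb")
  if (status != 0) stop("download failed: ", url)
  invisible(TRUE)
}

# Linking a copy downloaded elsewhere (placeholder paths) makes the check pass:
# file.symlink(from = "/path/to/existing/file", to = file.path("data", "file"))
```

A symlink satisfies `file.exists()` as long as its target exists, so linking avoids duplicating large files.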

Finally, I realized that the previous script relies on biogroom, which is still a private repository, so I copied in the functions needed for the final step of extracting and cleaning the phenotype data.
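For context on what that cleaning step typically involves, here is a sketch under stated assumptions. It is not the biogroom code. It assumes the phenotype table is GEO-style, with `characteristics_ch1*` columns whose cells read `key: value`, and spreads them into one column per key. `clean_characteristics()` and the toy table are hypothetical.

```r
# Hypothetical helper: spread GEO-style "key: value" characteristics columns
# into one character column per key (column names made syntactic)
clean_characteristics <- function(pheno) {
  n <- nrow(pheno)
  char_cols <- grep("^characteristics_ch1", names(pheno), value = TRUE)
  cleaned <- list()
  for (col in char_cols) {
    cells <- as.character(pheno[[col]])
    has_key <- !is.na(cells) & grepl(":", cells, fixed = TRUE)
    # Key is the text before the first colon; value is everything after it
    keys <- ifelse(has_key, trimws(sub(":.*$", "", cells)), NA_character_)
    values <- ifelse(has_key, trimws(sub("^[^:]*:", "", cells)), NA_character_)
    for (key in unique(keys[!is.na(keys)])) {
      name <- make.names(key)
      if (is.null(cleaned[[name]])) cleaned[[name]] <- rep(NA_character_, n)
      hit <- !is.na(keys) & keys == key
      cleaned[[name]][hit] <- values[hit]
    }
  }
  data.frame(cleaned, row.names = rownames(pheno),
             stringsAsFactors = FALSE, check.names = FALSE)
}

pheno <- data.frame(
  characteristics_ch1 = c("tissue: prostate", "tissue: prostate"),
  characteristics_ch1.1 = c("gleason score: 7", "gleason score: 9"),
  title = c("a", "b"),
  row.names = c("GSM1", "GSM2"),
  stringsAsFactors = FALSE
)
clean_characteristics(pheno)
```

Values are left as character so that type conversion (e.g. Gleason score to integer) can be done explicitly per variable afterwards.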

@gadenbuie requested a review from tgerke on July 30, 2018 17:31
